

**Engineering and Technology** 


# High-speed and area-efficient scalable *N*-bit digital comparator

ISSN 1751-858X Received on 11th November 2018 Revised 17th December 2019 Accepted on 24th January 2020 E-First on 10th March 2020 doi: 10.1049/iet-cds.2018.5562 www.ietdl.org

Piyush Tyagi¹, Rishikesh Pandey¹ ⊠

<sup>1</sup>Department of Electronics and Communication Engineering, Thapar Institute of Engineering and Technology, Patiala, India ⊠ E-mail: riship23@gmail.com

**Abstract:** An area-efficient *N*-bit digital comparator with high operating speed and low power dissipation is presented in this work. The proposed comparator structure consists of two separate modules: the comparison evaluation module (CEM) and the final module (FM). Independent of the input operand bitwidth, the stages of the CEM use a regular structure of repeated logic cells that implements a parallel prefix tree. The FM validates the final comparison based on the results obtained from the CEM. The regular very large-scale integration topology of the proposed structure allows the analytical derivation of the area, in terms of the total number of transistors in the design, and of the total input–output delay as functions of the input operand bitwidth. Spectre simulation results are presented using 0.18 µm complementary metal–oxide–semiconductor (CMOS) technology at 1 GHz. The main advantages of the proposed comparator are a minimum input–output delay of 0.57 ns, a minimum fan-out-of-4 delay of 9.5 ns and a low power dissipation of 1.03 mW as compared with existing comparators designed using 180 nm CMOS technology for 64 bit comparison.

# 1 Introduction

The digital comparator is a fundamental design element in applications whose final results depend on computations that involve comparison. A wide range of applications use the comparator as a basic design element, including scientific computations (digital image processing, pattern recognition/matching, arithmetic sorting, data compression and digital neural networks [1–3]) and test circuit applications (built-in self-test circuits, signature analysers and jitter measurement [4–5]).

An optimised comparator design is a key component of general-purpose computer architecture, where it is used for developing memory addressing logic, queue buffers, test circuits etc. [6-8]. The extensive use of comparator logic in various computation-based designs necessitates optimisation in terms of area, power and speed. Some comparator designs use dynamic logic to achieve low power consumption, but the limitations of low speed and poor noise margin make the dynamic design rather challenging. Other designs use subtractors in the form of flat adder components along with custom logic circuits [9-13] to implement the comparison process for wider operands, but these designs give a slower response and an area-intensive arrangement [14-16]. An improvement in scalability and a reduction in comparison delay have been achieved in the hierarchical prefix tree structure-based comparator, which is composed of 2 bit comparators at each level [17]. However, for wide input operands, these structures may be prohibitive due to the prolonged delay and power consumption arising from the log<sub>2</sub> N comparison levels. Some of the limiting factors of the parallel prefix tree structure, such as area and power consumption, can be improved by using two-input multiplexers at each level and generate-propagate logic at the first level [18]. However, this comparator structure has very high power consumption since every cell remains in the active state irrespective of the applied operand values.

Several comparators based on pipelining and power-down approaches [19] have been reported for speed improvement and power consumption reduction [20, 21]. A comparator design based on all-*N*-transistor dynamic complementary metal–oxide–semiconductor (CMOS) logic has been reported to compensate for high fan-in through high pipeline throughput [22]. An alternative structure that uses priority encoder-based magnitude-decision logic for improving operating speed has been proposed in [23]. This structure involves two pipelined operations that are synchronised with the rising and falling edges of the clock signal to eliminate the long dynamic logic chain and so improve the delay. However, the heavily loaded clock signal imposes limitations on the clock speed and jitter margin, which makes the design unsuitable for wide-range comparison. A comparator for large operand bitwidths is reported in [24], which comprises two comparator stages. The first stage performs 8 bit comparisons; its results are then transferred to the priority encoder and 8-to-1 multiplexer in the second stage, which select the appropriate result obtained from the first stage. Two-phase domino clocking [25, 26] is utilised in the comparator so that the two-stage operations can be performed in a single clock cycle, synchronised with the rising and falling edges of the clock signal. This further limits the jitter margin and operating speed, and the comparator becomes sensitive to race conditions [27]. Another comparator structure is proposed in [28] for enhancing operating speed using a combination of a two-phase domino clocking structure and a tree structure. In this structure, the carry-out signal is used as the indicator for the 'greater-than' or 'less-than' outputs. However, the heavy loading of the clock signal remains the bottleneck of the design, and therefore, large drivers are required for the clock signal. Some comparator structures improve power efficiency by removing dynamically redundant computations using ripple-based structures [29-31].

Similarly, most of the structures include compute-on-demand comparators that focus on the reduction of switching activities for achieving energy-efficient design [32–34]. However, these structures experience a prolonged delay in the worst-case scenario when the wide operands are considered for the comparison. To reduce the delay and power consumption due to the addition of ripple-based computations in the design, a comparison scheme based on bitwise competition logic has been proposed [35]. The pre-encoder structure in this approach limits the operating frequency and increases power consumption.

A parallel binary comparator reported in [36] uses regular digital hardware structure independent of input bitwidths but its area and power dissipation are high. To eliminate the limitations of the previous comparator structures, some designs are proposed, which leverage the two-level approach for comparison [37].



Fig. 1 Comparison between two 16 bit operands



Fig. 2 Comparison between two N-bit operands



Fig. 3 Flowchart of the proposed N-bit digital comparator

Table 1 Used symbols and their descriptions

| Symbol   | Description                                  |
|----------|----------------------------------------------|
| N        | operand bitwidth                             |
| A        | first input operand                          |
| B        | second input operand                         |
| $\prod$  | bitwise AND                                  |
| $\sum$   | bitwise OR                                   |
| COMP     | complement function                          |
| AGB      | A is greater than B (in terms of magnitude)  |
| ALB      | A is lesser than B (in terms of magnitude)   |
| AEB      | A is equal to B (in terms of magnitude)      |
| E        | representation of equal bitwise comparison   |
| X        | representation of unequal bitwise comparison |
| $\odot$  | bitwise EX-NOR operation                     |
| $\oplus$ | bitwise EX-OR operation                      |
| CEM      | comparison evaluation module                 |
| FM       | final module                                 |

However, the presence of a chain architecture containing a series of transistors further limits the operating speed of the comparator. Other binary comparators based on a single bus structure have been reported in [38, 39], but these comparators have high power dissipation and low operating speed. Also, the reported structures use a large area in terms of the number of transistors.

To eliminate the limitations of previous comparator designs, which mainly include multicycle-based computation, enormous power consumption, uneven transistor geometry and area consumption, we propose a scalable *N*-bit digital comparator that targets the optimisation of speed, power consumption and area in terms of the number of transistors. The proposed digital comparator uses a novel exclusive-OR-NOR (EX-OR-NOR) cell, which improves the performance in terms of area, power consumption and operating speed. This paper is organised as follows. Sections 2 and 3 present the design methodology and circuit description of the proposed *N*-bit digital comparator, respectively. The estimations of area, power consumption and operating speed of the proposed comparator for operands ranging from 4 to 64 bit are discussed in Section 4. Simulation results are presented in Section 5. Finally, the conclusion is given in Section 6.

# 2 Design methodology of the proposed *N*-bit digital comparator

The working principle of conventional comparison is shown in Fig. 1, where the operands A and B have unequal most significant bits (MSBs). Since the first unequal bit pair encountered in operands A and B is sufficient to decide the outcome of the comparison, the remaining bit positions are ignored.

The comparison process for N-bit operands starts from the (N-1)th bit (the MSB) and proceeds to the (N-2)th bit, and onward toward the least significant bit (LSB), if and only if the MSBs of the two operands are equal.

As shown in Fig. 2, the comparison process continues to compare the bit pairs obtained from the operands until it gets an unequal pair of bits on its way toward the LSB bit position. The unequal bit pair (X) and equal bit pair (E) are realised as

$$X = A \oplus B \tag{1}$$

$$E = A \odot B \tag{2}$$

The flowchart of the algorithm used to implement the proposed *N*-bit digital binary comparator is shown in Fig. 3. The symbols used in the proposed design and their descriptions are listed in Table 1. The two *N*-bit input operands A and B are compared bitwise to check whether they are equal. If the result of the comparison is 'equal', the proposed comparator drives the output AEB to logic 1. If the result is 'unequal', the pre-encoder output bits are checked from the MSB to the LSB, and the output AGB or ALB goes to logic 1 based on the results of the pre-encoder. The proposed algorithm reduces the superfluous switching activity that occurs during the comparison operation, which in turn limits the dynamic power consumption of the proposed comparator.
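The decision flow described above can be illustrated with a short behavioural sketch in Python. It is only a software model of the MSB-first comparison principle of (1) and (2), not the proposed circuit; the function name `compare_msb_first` and the dictionary output format are assumptions introduced for illustration.

```python
def compare_msb_first(a: int, b: int, n: int) -> dict:
    """Behavioural model: scan bit pairs from the MSB toward the LSB.

    The function name and return format are illustrative assumptions.
    """
    for k in range(n - 1, -1, -1):
        a_k = (a >> k) & 1
        b_k = (b >> k) & 1
        x = a_k ^ b_k   # Eq. (1): unequal bit pair, X = A xor B
        e = 1 - x       # Eq. (2): equal bit pair, E = A xnor B
        if e:
            continue    # equal pair: move on toward the LSB
        # The first unequal pair decides the outcome; lower bits are ignored.
        return {"AGB": a_k, "ALB": b_k, "AEB": 0}
    # Every bit pair was equal.
    return {"AGB": 0, "ALB": 0, "AEB": 1}


if __name__ == "__main__":
    print(compare_msb_first(0b1010, 0b1001, 4))  # A > B
    print(compare_msb_first(0b0110, 0b0110, 4))  # A = B
```

In hardware the bit pairs are evaluated in parallel rather than in a loop; the loop only mirrors the priority given to the most significant unequal pair.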

# 3 Circuit description of the proposed *N*-bit digital comparator

The proposed *N*-bit digital comparator is shown in Fig. 4. For performing a comparison between two *N*-bit binary operands, the proposed structure is divided into the comparison evaluation module (CEM) and the final module (FM). These modules serve as high-level and low-level architectures. The CEM incorporates a parallel prefix tree structure that performs a bitwise comparison of the two *N*-bit operands A and B, depicted by $A_{N-1}A_{N-2} \ldots A_0$ and $B_{N-1}B_{N-2} \ldots B_0$, respectively.

The complete process of comparison is divided into five sets, in which the CEM contains sets 1–4 and the FM contains only set 5. All the sets in the design are placed in four hierarchical prefix orders according to their functionality; therefore, the output of each set serves as the input of the next set, with the exception of set 1, whose outputs act as the inputs of both sets 2 and 3.

**Fig. 4** Proposed N-bit digital comparator

Fig. 5 Novel EX-OR-NOR cell

Fig. 6 Transient response of EX-OR-NOR cell

In set 1, the bitwise comparison of the two N-bit binary operands is carried out by the novel EX-OR-NOR cell. The proposed structure of the EX-OR-NOR cell, shown in Fig. 5, is based on pass transistor logic and CMOS logic. It uses seven transistors for the EX-OR and EX-NOR operations, as compared with the conventional eight-transistor model [37]. The transistor M5 is used to obtain the full output voltage swing of the EX-NOR operation, as shown in Fig. 6. A six-transistor model has also been reported in [40], but it gives a limited output voltage swing when the applied input operands are (0, 0) or (1, 1). The aspect ratios of the seven transistors (M1-M7) of the proposed EX-OR-NOR cell, which consist of four P-channel MOS (PMOS) and three N-channel MOS transistors, are optimised to avoid the universal drive constraint faced by pass transistor logic. The novel structure uses a PMOS transistor in the feedback path to maintain the logic level on the EX-NOR output terminal, and CMOS logic to boost the output so that the full voltage swing is achieved on the EX-OR output terminal.

The outputs of novel EX-OR-NOR cells provide the termination and comparison bits intended for sets 2 and 3 structures.

The operation of the novel EX-OR-NOR cell is described as

$$\text{set 1: } T_K = A_K \odot B_K \tag{3}$$

$$\text{set 1: } D_K = A_K \oplus B_K \tag{4}$$

where  $T_K$  indicates equal bit pair,  $D_K$  indicates unequal bit pair of operands A and B and K is an integer, which varies in the range of  $0 \le K \le N-1$ .

Set 2 comprises cells that operate on the termination bits $(T_K)$ obtained from set 1. The logic cells in set 2 combine the termination bits obtained from the nibble partitions of set 1 (each partition compares 4 bit of the operands, starting from the MSB) with the outputs of the preceding AND-type logic cells at the same level of set 2.

The equal flags $E_{[(N/4)-1]}$ to $E_0$ generated by set 2 control the switching activities of the subsequent partitions of set 3.



Fig. 7 16 bit comparison using the proposed N-bit digital comparator

A comparison request is generated by set 2 if and only if all the results obtained from the bitwise comparisons performed by the preceding cells of set 1 are 'equal'; otherwise, termination bits of logic 0 are generated. The operation of set 2 is expressed as

$$\text{set 2: } E_{m-1} = \prod_{m=1}^{(N/4)-1} T_{4m+3} T_{4m+2} T_{4m+1} T_{4m} E_m \tag{5}$$

$$\text{set 2: AEB (when } m = 0\text{)} = \prod T_3 T_2 T_1 T_0 E_0 \tag{6}$$

where $E_{m-1}$, for $m = 1$ to $[(N/4)-1]$, represent the equal flags of set 2.

Set 3 includes cells, which combine the outputs obtained from sets 1 and 2. The number of inputs increases in the ascending order from left to right for each cell in their respective partition and ending with the maximum fan-in of six. The combination of sets 1 and 3 architectures forms the pre-encoder structure. If most significant unequal bits are received in the comparison process of two operands, then the output bits obtained from sets 1 and 2 allow the termination of the subsequent bitwise comparison activity of the logic cells present in set 3. Computation process of the cells present in each partition of set 3 can be written as

$$C_{m,1} = \text{COMP}\left(\prod_{m=0}^{(N/4)-1} E_m A_{4m+3} D_{4m+3}\right) \tag{7}$$

$$C_{m,2} = \text{COMP}\left(\prod_{m=0}^{(N/4)-1} E_m A_{4m+2} D_{4m+2} T_{4m+3}\right) \tag{8}$$

$$C_{m,3} = \text{COMP}\left(\prod_{m=0}^{(N/4)-1} E_m A_{4m+1} D_{4m+1} T_{4m+3} T_{4m+2}\right) \tag{9}$$

$$C_{m,4} = \text{COMP}\left(\prod_{m=0}^{(N/4)-1} E_m A_{4m} D_{4m} T_{4m+3} T_{4m+2} T_{4m+1}\right) \tag{10}$$

where  $C_{m,1}$ ,  $C_{m,2}$ ,  $C_{m,3}$  and  $C_{m,4}$  {for m = [(N/4)-1] to 0} represent outputs of NAND-type logic cells for the mth partition of set 3.

Set 4 contains NAND-type logic cells, which receive their inputs from set 3. It requires (N/4) cells, one to combine the outputs of each partition of set 3. The complete operation can be written as

$$\text{set 4: } G_m = \text{COMP}\left(\prod_{m=0}^{(N/4)-1} C_{m,1} C_{m,2} C_{m,3} C_{m,4}\right) \tag{11}$$

where  $G_m$  {for m = [(N/4)-1] to 0} represent the outputs of the mth logic cell.

Set 5 contains two NOR-type logic cells to decide the final results of the proposed digital comparator in terms of 'ALB' and 'AGB'. First NOR gate uses outputs of set 4 and 'AEB' as inputs to decide 'ALB', whereas second NOR gate uses the output of first NOR gate and 'AEB' as inputs to decide 'AGB'.

The computation process of set 5 is given by

$$\text{set 5: ALB} = \text{COMP}\left(\sum G_{(N/4)-1} \ldots G_0 (\text{AEB})\right) \tag{12}$$

$$\text{set 5: AGB} = \text{COMP}\left(\sum (\text{ALB})(\text{AEB})\right) \tag{13}$$

'1101,1111,1111,1111'. Set 4 combines four nibbles obtained from the four partitions of set 3 into 4 bit data as '1000'.

Finally, set 5 acquires the 4 bit input pattern from set 4 and output bit 'AEB' from set 2 to give the final decision. Since A is greater than B, the proposed comparator structure provides the outputs AGB = '1', ALB = '0' and AEB = '0'.
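As a cross-check of (3)–(13), the five sets can be modelled behaviourally in Python. This is a minimal sketch under stated assumptions, not the authors' netlist: the names `bit` and `compare_sets` are hypothetical, and the equal flag of the most significant partition, $E_{(N/4)-1}$, is assumed to be tied to logic 1 because nothing more significant precedes it.

```python
def bit(value: int, k: int) -> int:
    """Return bit k of value (k = 0 is the LSB)."""
    return (value >> k) & 1


def compare_sets(a: int, b: int, n: int) -> dict:
    """Gate-level behavioural model of sets 1-5 (hypothetical helper)."""
    if n <= 0 or n % 4:
        raise ValueError("n must be a positive multiple of 4")
    parts = n // 4

    # Set 1, Eqs. (3)-(4): termination bits T_K and difference bits D_K.
    D = [bit(a, k) ^ bit(b, k) for k in range(n)]
    T = [1 - d for d in D]

    # Set 2, Eqs. (5)-(6): equal flags and AEB.
    # Assumption: the flag of the most significant partition is logic 1.
    E = [1] * parts
    for m in range(parts - 1, 0, -1):
        E[m - 1] = T[4*m + 3] & T[4*m + 2] & T[4*m + 1] & T[4*m] & E[m]
    aeb = T[3] & T[2] & T[1] & T[0] & E[0]

    # Set 3, Eqs. (7)-(10): NAND-type cells, four per partition.
    C = []
    for m in range(parts):
        k3, k2, k1, k0 = 4*m + 3, 4*m + 2, 4*m + 1, 4*m
        C.append([
            1 - (E[m] & bit(a, k3) & D[k3]),
            1 - (E[m] & bit(a, k2) & D[k2] & T[k3]),
            1 - (E[m] & bit(a, k1) & D[k1] & T[k3] & T[k2]),
            1 - (E[m] & bit(a, k0) & D[k0] & T[k3] & T[k2] & T[k1]),
        ])

    # Set 4, Eq. (11): one NAND-type cell per partition.
    G = [1 - (c[0] & c[1] & c[2] & c[3]) for c in C]

    # Set 5, Eqs. (12)-(13): two NOR-type cells.
    alb = 1 - (max(G) | aeb)
    agb = 1 - (alb | aeb)

    # Report set 3 and set 4 outputs with the most significant partition first.
    order = range(parts - 1, -1, -1)
    return {
        "set3": ",".join("".join(str(v) for v in C[m]) for m in order),
        "set4": "".join(str(G[m]) for m in order),
        "AGB": agb, "ALB": alb, "AEB": aeb,
    }


if __name__ == "__main__":
    r = compare_sets(0b1010101010101010, 0b1001100110011001, 16)
    print(r["set3"])  # 1101,1111,1111,1111
    print(r["set4"])  # 1000
    print(r["AGB"], r["ALB"], r["AEB"])  # 1 0 0
```

For the 16 bit operands A = 1010101010101010 and B = 1001100110011001 used in Section 5, the sketch gives the set 3 and set 4 patterns and the final outputs quoted in the text.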

# 4 Area, power consumption and operating speed estimations

The estimation of the required area, power consumption and operating speed of the proposed *N*-bit digital comparator is presented in this section.

## 4.1 Required area analysis

Area analysis of the proposed comparator is performed by estimating the total number of cells required in the different sets; the logic cell count is then translated into the total number of transistors.

**Table 2** Total number of logic cells used in the proposed comparator in CEM

| Bitwidth, bit | Number of EX-OR-NOR cells used in set 1 | Number of AND-type logic cells used in set 2 | Number of NAND-type logic cells used in set 3 | Number of NAND-type logic cells used in set 4 |
|---------------|-----------------------------------------|----------------------------------------------|-----------------------------------------------|-----------------------------------------------|
| 16            | 16                                      | 3                                            | 16                                            | 4                                             |
| 24            | 24                                      | 5                                            | 24                                            | 6                                             |
| 32            | 32                                      | 7                                            | 32                                            | 8                                             |
| 64            | 64                                      | 15                                           | 64                                            | 16                                            |
| 128           | 128                                     | 31                                           | 128                                           | 32                                            |

**Table 3** Total number of transistors for different comparator bitwidths in CEM

| Bitwidth, bit | Number of transistors used in set 1 | Number of transistors used in set 2 | Number of transistors used in set 3 | Number of transistors used in set 4 | Total number of transistors |
|---------------|-------------------------------------|-------------------------------------|-------------------------------------|-------------------------------------|-----------------------------|
| 16            | 16 × 7                              | 3 × 12                              | 16 × 12                             | 4 × 8                               | 372                         |
| 24            | 24 × 7                              | 5 × 12                              | 24 × 12                             | 6 × 8                               | 564                         |
| 32            | 32 × 7                              | 7 × 12                              | 32 × 12                             | 8 × 8                               | 756                         |
| 64            | 64 × 7                              | 15 × 12                             | 64 × 12                             | 16 × 8                              | 1524                        |
| 128           | 128 × 7                             | 31 × 12                             | 128 × 12                            | 32 × 8                              | 3060                        |
| 256           | 256 × 7                             | 63 × 12                             | 256 × 12                            | 64 × 8                              | 6132                        |
| 512           | 512 × 7                             | 127 × 12                            | 512 × 12                            | 128 × 8                             | 12,276                      |

Fig. 8 Total number of active transistors of the proposed comparator for different bitwidths

Fig. 9 Total number of transistors used for the proposed comparator and different comparators reported in the literature (64 bit)

Using (3)–(13), the total numbers of logic cells required in the CEM ($C_{\rm CEM}$) and the FM ($C_{\rm FM}$) are given by (14) and (15), respectively, as

$$C_{\text{CEM}} = (N \times (\text{set 1 cell})) + \left(\frac{N}{4} \times (\text{set 2 cell})\right) + (N \times (\text{set 3 cell})) + \left(\frac{N}{4} \times (\text{set 4 cell})\right) \tag{14}$$

$$C_{\rm FM} = (2 \times (\text{set 5 cell})) \tag{15}$$

**Table 4** Worst-case operands for different bitwidths

| Bitwidth, bit | Worst-case operands             |
|---------------|---------------------------------|
| 4             | A = 0000 and B = 0000           |
|               | A = 0001 and B = 0000           |
|               | A = 0000 and B = 0001           |
|               | A = 0001 and B = 0001           |
| 8             | A = 00000000 and B = 00000000   |
|               | A = 00000001 and B = 00000000   |
|               | A = 00000000 and B = 00000001   |
|               | A = 00000001 and B = 00000001   |
| 16            | A = 00000000 and B = 00000000   |
|               | A = 00000001 and B = 00000000   |
|               | A = 00000000 and B = 00000001   |
|               | A = 00000001 and B = 00000001   |
| 32            | A = 00000000 and B = 00000000   |
|               | A = 00000001 and B = 00000000   |
|               | A = 00000000 and B = 00000001   |
|               | A = 00000001 and B = 00000001   |
| 64            | A = 00000000 and B = 00000000   |
|               | A = 00000001 and B = 00000000   |
|               | A = 00000000 and B = 00000001   |
|               | A = 00000001 and B = 00000001   |

The total number of cells and transistors required for different bitwidths in CEM are listed in Tables 2 and 3, respectively.
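The entries of Tables 2 and 3 can be reproduced with a few lines of Python. This is a sketch for checking the arithmetic only; the names `CELL_TRANSISTORS`, `cem_cells` and `cem_transistors` are illustrative. Note that it follows the tables, which list (N/4) − 1 AND-type cells in set 2, whereas (14) is written with N/4 for that term.

```python
# Transistors per logic cell, read from the "cells x transistors" entries of Table 3.
CELL_TRANSISTORS = {"set1": 7, "set2": 12, "set3": 12, "set4": 8}


def cem_cells(n: int) -> dict:
    """Logic cells per set of the CEM for an n-bit comparator (per Table 2)."""
    if n <= 0 or n % 4:
        raise ValueError("n must be a positive multiple of 4")
    return {"set1": n, "set2": n // 4 - 1, "set3": n, "set4": n // 4}


def cem_transistors(n: int) -> int:
    """Total CEM transistor count for an n-bit comparator (per Table 3)."""
    cells = cem_cells(n)
    return sum(cells[s] * CELL_TRANSISTORS[s] for s in cells)


if __name__ == "__main__":
    for n in (16, 24, 32, 64, 128, 256, 512):
        print(n, cem_cells(n), cem_transistors(n))
```

With these cell counts the total simplifies to 24N − 12 transistors, so the CEM area grows linearly with the operand bitwidth.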

## 4.2 Power consumption

In most of the digital circuits, power dissipation arises due to dynamic switching activity in the design. As a result, minimising the switching activities is the vital key for reduction of overall average power dissipation of the modern low-power designs. Therefore, the switching activities have been minimised in the proposed structure of digital comparator using termination bits for subsequent computations.

The power-saving capability of the proposed comparator is discussed based on switching activities of logic cells of each set:

- In set 1, power saving is not achieved since the input operands simultaneously excite all the logic cells present in set 1.
- In set 2, the logic cell that operates on the termination bits obtained from the most significant nibble partition of set 1 always remains active. The activation of the subsequent logic cells depends on the termination bits obtained from the nibble partitions of set 1 and on the outputs of the preceding AND-type logic cells at the same level of set 2.

- Set 3 includes cells, which combine the outputs obtained from sets 1 and 2. These results are further used for the activation or deactivation of the cells at specific bitwise positions. Therefore, only one cell has switching activity in set 3, resulting in a significant reduction in the power dissipation.
- The single-active logic cell of set 3 further triggers subsequent logic cell present in set 4. Thus, only one cell of set 4 will be active, which leads to an additional reduction in the power dissipation.

- Power saving is not achieved in set 1 because all of its transistors participate in switching activities. From Table 3, it can be seen that for 16 bit comparison 112 transistors participate in switching activities, so set 1 contributes 30.1% (112 transistors out of a total of 372 transistors) of the total switching activities of 16 bit comparison.
- Since the worst-case operands have 12 most significant equal bit pairs from $A1_{15}B1_{15}$ to $A1_4B1_4$ = '1111 1111 1111' (or $A2_{15}B2_{15}$ to $A2_4B2_4$ = '1111 1111 1111'), all logic cells in set 2 will be activated. Table 3 shows that set 2 uses only 36 transistors out of a total of 372 transistors; therefore, set 2 contributes 9.67% of the total switching activities of 16 bit comparison.
- The worst-case operands have 15 equal bit pairs $A1_{15}B1_{15}$ to $A1_{1}B1_{1}$ = '1111 1111 1111 111' (or $A2_{15}B2_{15}$ to $A2_{1}B2_{1}$ = '1111 1111 1111 111') and a least significant unequal bit pair $A1_{0}B1_{0}$ = '10' (or $A2_{0}B2_{0}$ = '01'), which results in the activation of only one logic cell of set 3. Table 3 shows that one logic cell in set 3 contains only 12 transistors. Hence, due to the single activated logic cell, set 3 contributes only 3.2% (12 transistors out of a total of 372 transistors) of the total switching activities of 16 bit comparison. This share decreases further as the comparator's bitwidth increases.
- The single active logic cell of set 3 triggers the subsequent logic cell in set 4. Table 3 shows that one logic cell in set 4 contains only eight transistors; therefore, set 4 contributes only 2.15% (8 transistors out of a total of 372 transistors) of the total switching activities for 16 bit comparison, and this share also decreases as the comparator's bitwidth increases.

Hence, only 168 of the 372 transistors of the CEM (i.e. about 45.2%) are activated for 16 bit comparison, and therefore, it can be concluded that the power reduction methodology of the proposed comparator offers low power consumption.
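The per-set shares quoted above can be checked with a small Python sketch. It assumes the worst-case activity pattern described in the list (all of set 1 and set 2 active, one cell each in sets 3 and 4) and the per-cell transistor counts of Table 3; the function names are illustrative and the model counts transistors only, it does not estimate power.

```python
# Transistors per logic cell, as listed in Table 3.
SET1, SET2, SET3, SET4 = 7, 12, 12, 8


def cem_total(n: int) -> int:
    """Total CEM transistors for an n-bit comparator (per Table 3)."""
    return n * SET1 + (n // 4 - 1) * SET2 + n * SET3 + (n // 4) * SET4


def cem_active_worst_case(n: int) -> int:
    """Active CEM transistors, assuming the worst-case pattern in the text:
    every cell of sets 1 and 2 switches, plus one cell each in sets 3 and 4."""
    return n * SET1 + (n // 4 - 1) * SET2 + SET3 + SET4


def active_share_percent(n: int) -> float:
    return 100.0 * cem_active_worst_case(n) / cem_total(n)


if __name__ == "__main__":
    n = 16
    print(cem_active_worst_case(n), cem_total(n))  # 168 372
    print(round(active_share_percent(n), 2))       # 45.16
```

Computed directly, 168/372 is about 45.2%; summing the individually rounded per-set percentages gives a slightly smaller figure.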

## 4.3 Operating speed

The critical path delay of the proposed comparator is evaluated by applying two N-bit operands. Critical delay is the summation of all the cell delays that come across in the critical path. In the proposed comparator, it originates from the logic cell present in the set 1 of CEM to the second NOR logic cell present in the set 5 of FM. The total encountered delay in the critical path would become the applicable minimum time period of the input, which further decides the maximum operating frequency of the proposed design. Total critical path delay of the CEM ( $D_{\rm CEM}$ ) can be described using the mathematical expression as

$$D_{\text{CEM}} = D_{\text{set } 1} + D_{\text{set } 2} + D_{\text{set } 3} + D_{\text{set } 4} \tag{16}$$

The three terms $D_{\text{set 1}}$, $D_{\text{set 3}}$ and $D_{\text{set 4}}$ in (16) are each equal to the single-activated cell delay ($D_{\text{U}}$), whereas $D_{\text{set 2}}$ is equal to $(N/4)D_{\text{U}}$. Therefore, (16) is modified as

$$D_{\text{CEM}} = 3D_{\rm U} + \left(\frac{N}{4}\right)D_{\rm U} \tag{17}$$



**Fig. 10** Transient responses of the proposed comparator (a) 8 bit, (b) 16 bit, (c) 32 bit, (d) 64 bit

**Table 5** Maximum delay of the proposed comparator (for different bitwidths)

| Operand bitwidths used for evaluation, bit | Maximum delay, ns |
|--------------------------------------------|-------------------|
| 4                                          | 0.2957            |
| 8                                          | 0.3184            |
| 16                                         | 0.376             |
| 32                                         | 0.476             |
| 64                                         | 0.573             |







Fig. 11 Transient responses of the proposed comparator for 16 bit comparison

(a) Set 2, (b) Set 4, (c) Set 5

where N represents the operand's bitwidth.

The delay due to FM ( $D_{FM}$ ) can be written as

$$D_{\rm FM} = 2D_{\rm U} \tag{18}$$



Fig. 12 Critical path delay and worst power dissipation of proposed digital comparator

(a) Critical path delay versus operand bitwidths, (b) Worst power dissipation versus number of bits required for evaluation

The total delay of the proposed comparator evaluated from input to output is given as

$$D_{\rm T} = D_{\rm CEM} + D_{\rm FM} \tag{19}$$

Using (17) and (18) in (19), we get

$$D_{\rm T} = 5D_{\rm U} + \left(\frac{N}{4}\right)D_{\rm U} \tag{20}$$

From (20), it is evident that the proposed comparator has a lower delay than the similar comparator structures reported in the literature.
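To make the scaling in (20) concrete, the total delay can be tabulated in units of the single-cell delay $D_{\rm U}$. The Python sketch below only evaluates the analytical expressions; the function names are illustrative and no value is assumed for $D_{\rm U}$ itself.

```python
def cem_delay_units(n: int) -> int:
    """CEM delay in units of D_U: three single-cell stages plus N/4 for set 2."""
    if n <= 0 or n % 4:
        raise ValueError("n must be a positive multiple of 4")
    return 3 + n // 4


def fm_delay_units() -> int:
    """FM delay in units of D_U, Eq. (18): two NOR-type cells in series."""
    return 2


def total_delay_units(n: int) -> int:
    """Total delay in units of D_U, Eqs. (19)-(20): 5 + N/4."""
    return cem_delay_units(n) + fm_delay_units()


if __name__ == "__main__":
    for n in (4, 8, 16, 32, 64):
        print(n, total_delay_units(n))
```

The analytical model grows linearly with N/4; it treats every cell as having the same delay $D_{\rm U}$, so it indicates the trend rather than the simulated delay values.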

The worst-case cell activity of the proposed comparator occurs when the operands take the values A = 0000...01 and B = 0000...00 (or vice versa). The number of transistors is plotted against the comparator bitwidth in Fig. 8. For each bitwidth, the first (red) bar shows the total number of transistors used in the proposed comparator and the second (green) bar shows the number of active transistors required to obtain the comparison outcome. Fewer than 50% of the transistors are active for each bitwidth of the proposed digital comparator. Therefore, the proposed design provides the optimum solution in terms of number of active transistors, scalability, power consumption and operating speed.

# 5 Results and discussion

The proposed *N*-bit digital comparator has been designed and simulated using Cadence Virtuoso Design Environment with 0.18 µm CMOS technology. For the realisation of the worst-case delay of the proposed digital comparator, we have applied the input operands that would activate the maximum number of cells in all the sets.

The proposed comparator uses the minimum number of transistors for 64 bit comparison, and Fig. 9 compares the number of transistors used in the proposed design with that of the other comparator structures (of the same technology node) reported in the literature.

**Table 6** Operand bits required for evaluation of the 64 bit comparison

| Operand bits used for evaluation, bit | Operands (64 bit)     |
|---------------------------------------|-----------------------|
| 4                                     | A = 1111,1111…1111    |
|                                       | B = 1110,1111…1111    |
| 8                                     | A = 1111,1111…1111    |
|                                       | B = 1111,1110…1111    |
| ⋮                                     | ⋮                     |
| 64                                    | A = 1111,1111…1111    |
|                                       | B = 1111,1111…1110    |

The transient responses of the comparator for various bitwidths obtained for worst-case operands (listed in Table 4) are shown in Figs. 10*a*–*d*. The maximum delays for 4, 8, 16, 32 and 64 bit comparison are listed in Table 5.

To explain the intermediate operations of the different sets, two input operands A = 1010101010101010 and B = 1001100110011001 are chosen for 16 bit comparison (as discussed in Section 3). The transient responses of set 2, set 4 and set 5 (for 16 bit comparison) are shown in Figs. 11*a*–*c*. The outputs of set 1 and set 3 are not included because set 1 contains only the proposed EX–OR–NOR cells and set 3 contains only NAND gates.
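As a behavioural cross-check for the 16 bit example above, the following minimal Python sketch models a generic parallel prefix comparison tree. It is not the CEM/FM cell netlist of the proposed design: the function name `prefix_compare` and the per-group signal pair (greater, equal) are assumptions introduced here purely for illustration.

```python
def prefix_compare(a: int, b: int, width: int) -> tuple:
    """Return (greater, equal) for two unsigned `width`-bit operands.

    Generic parallel-prefix model (illustrative assumption, not the
    cell-level structure of the proposed comparator).
    """
    # Leaf level, MSB first: g = 1 when the A bit is 1 and the B bit is 0,
    # e = 1 when the two bits are equal (XNOR).
    pairs = []
    for i in range(width - 1, -1, -1):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        pairs.append((ai & (1 - bi), 1 - (ai ^ bi)))

    # Tree levels: merge neighbouring groups. The more significant group
    # decides the result unless it reports equality.
    while len(pairs) > 1:
        merged = []
        for j in range(0, len(pairs) - 1, 2):
            (g_hi, e_hi), (g_lo, e_lo) = pairs[j], pairs[j + 1]
            merged.append((g_hi | (e_hi & g_lo), e_hi & e_lo))
        if len(pairs) % 2:  # an unpaired group passes through unchanged
            merged.append(pairs[-1])
        pairs = merged
    return pairs[0]


A = 0b1010101010101010
B = 0b1001100110011001
greater, equal = prefix_compare(A, B, 16)
less = 1 - (greater | equal)
print(greater, equal, less)
```

For these operands the first differing bit (third from the MSB) is 1 in A and 0 in B, so the model reports A > B.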

Fig. 12*a* shows the maximum input–output delay versus comparator bitwidth. From this figure, it can be seen that the proposed comparator has a maximum input–output delay of 0.57 ns in the worst-case scenario for 64 bit comparison. Therefore, the proposed comparator has a maximum operating speed of 1.75 GHz, which makes it the fastest among the existing comparators designed using 0.18 µm CMOS technology.

**Table 7** Comparison between the proposed comparator and existing comparators reported in the literature

| Comparator structures | Technology/power supply | Transistor count/comparator bitwidth | Power dissipation | Delay | Remarks |
|---|---|---|---|---|---|
| proposed (static type) | 0.18 μm/1.8 V | 1586/64 bit<br>802/32 bit<br>606/24 bit | 1.03 mW at 1 GHz (64 bit)<br>1.03 μW/MHz | 0.57 ns (64 bit) (9.5 FO4) | (1) low-power dissipation<br>(2) high operating speed<br>(3) low transistor count<br>(4) low FO4 delay (in comparison with existing comparators designed using 0.18 μm CMOS technology) |
| Hafeez <i>et al.</i> [36] (static type) | 0.15 μm/1.5 V | 4000/64 bit | 7.76 mW at 1 GHz (64 bit) | 0.86 ns (64 bit) (17.2 FO4) | (1) high transistor count<br>(2) limited power efficiency |
| Hensley <i>et al.</i> [32] (static type) | 0.18 μm/1.8 V | 624/24 bit | 5.23 mW at 100 MHz (24 bit)<br>0.735 μW/MHz | 4.16 ns (24 bit) (69.3 FO4) | (1) very slow<br>(2) transistor count, power dissipation and delay are reported for 24 bit comparison |
| Perri and Corsonello [28] (static type) | 0.35 μm/3.3 V | 1051 | 38 μW/MHz (64 bit) | 1.37 ns (64 bit) (9.4 FO4) | (1) limited power efficiency<br>(2) comparable FO4 delay value |
| | 90 nm/1 V | 1051 | 1 μW/MHz (64 bit) | 0.23 ns (64 bit) (8.6 FO4) | |
| Frustaci <i>et al.</i> [26] (static type) | 90 nm/1 V | 1359 | 0.77 μW/MHz | 0.22 ns (64 bit) (8.3 FO4) | (1) comparable power dissipation value |
| Lam and Tsui [34] | 0.35 μm/3.3 V | 3386/64 bit | 14.2 mW at 200 MHz<br>42 μW/MHz | 2.82 ns (64 bit) (19.4 FO4) | (1) heavy clock loading along with a substantial number of gated transistors<br>(2) limited power efficiency |
| Kim and Yoo [35] | 0.18 μm/1.8 V | 964/32 bit | 2.53 mW at 200 MHz<br>12.65 μW/MHz | 1.12 ns (32 bit) (18.6 FO4) | (1) heavy loading of the dynamic clock with the gated number of transistors<br>(2) limited operating speed<br>(3) transistor count, power dissipation and delay are reported for 32 bit comparison |
| Boppana and Ren [37] | 90 nm/1.2 V | – | 0.898 mW (64 bit) | 0.858 ns (64 bit) (32.01 FO4) | (1) area extensive design for wide operands<br>(2) restricted operating speed |
| Chua <i>et al.</i> [38] and Chua and Kumar [39] | 0.18 μm/1.8 V | 1875/64 bit | 3.8 mW (64 bit) | 0.88 ns (64 bit) (14.66 FO4) | (1) area extensive design in terms of number of transistors<br>(2) high-power dissipation<br>(3) limited operating speed |
| Cadence [41] | 0.35 μm/3.3 V | 2456/64 bit | 17.54 mW at 200 MHz<br>34 μW/MHz | 1.93 ns (64 bit) (13.3 FO4) | (1) high-power dissipation in tree structure |

Fig. 12*b* shows the worst-case power dissipation versus the number of bits required for evaluation. For this plot, the operands were made to differ at one particular bit position (the 4th, 8th, 16th, 32nd or 64th), as given in Table 6.

For instance, in the comparison between two operands having values 1111...11 and 1110...11, only 4 bits are required to declare the result of the comparison. From this figure, it is observed that the worst-case power dissipation of the proposed comparator is 1.03 mW, which is lower than that of the reported comparator structures designed using 0.18 µm CMOS technology.
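The notion of "bits required for evaluation" can be sketched in a few lines of Python. This is an illustrative model only, assuming the bits are counted from the MSB down to the first position at which the operands differ; the helper name `bits_needed` is hypothetical.

```python
def bits_needed(a: int, b: int, width: int) -> int:
    """Number of bits, counted from the MSB, needed to decide a vs b.

    Assumption: evaluation proceeds MSB first and the outcome is known
    at the first differing bit; equal operands need all `width` bits.
    """
    diff = (a ^ b) & ((1 << width) - 1)
    if diff == 0:
        return width
    # bit_length() gives the 1-based index (from the LSB) of the highest
    # differing bit; convert it to a position counted from the MSB.
    return width - diff.bit_length() + 1


WIDTH = 64
ALL_ONES = (1 << WIDTH) - 1
for position in (4, 8, 16, 32, 64):
    # Clear the bit at `position` (counted from the MSB) in operand B.
    b = ALL_ONES ^ (1 << (WIDTH - position))
    print(position, bits_needed(ALL_ONES, b, WIDTH))
```

Each operand pair built this way differs at exactly one of the 4th, 8th, 16th, 32nd or 64th bit positions, so the function returns that position.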

Various state-of-the-art implementations of digital comparators based on recently reported topologies have been compared with the proposed digital comparator and are listed in Table 7. The delays of the proposed comparator and the reported comparators are normalised in terms of the fan-out-of-4 (FO4) minimum-sized inverter delay in the technology used. The proposed comparator has the minimum delay of 9.5 FO4 as compared with the FO4 delay values of the comparators [32, 35, 38, 39] designed using 0.18 µm CMOS technology. From this table, it can be seen that the comparator structures of [38, 39] have a high-power dissipation of 3.8 mW and a large delay of 0.88 ns as compared with 1.03 mW and 0.57 ns of the proposed comparator. The proposed comparator also offers the additional advantage of minimum area in terms of number of transistors, i.e. 1586 for 64 bit comparison, 802 for 32 bit comparison and 606 for 24 bit comparison, as compared with 1875 for 64 bit comparison [38, 39], 964 for 32 bit comparison [35] and 624 for 24 bit comparison [32] designed using 0.18 µm CMOS technology.
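The normalised figures quoted for the proposed comparator can be reproduced with simple arithmetic, as in the Python sketch below. The FO4 inverter delay of 60 ps for the 0.18 µm node is an assumption inferred here from the ratio 0.57 ns / 9.5 FO4; it is not stated explicitly in the text, and the function names are hypothetical.

```python
# Assumed FO4 inverter delay for 0.18 um CMOS, inferred from
# 0.57 ns / 9.5 FO4 (not a value stated in the paper).
FO4_PS_180NM = 60.0


def max_frequency_ghz(delay_ns: float) -> float:
    """Maximum operating frequency implied by a worst-case delay."""
    return 1.0 / delay_ns


def power_per_mhz_uw(power_mw: float, freq_mhz: float) -> float:
    """Power dissipation normalised to operating frequency, in uW/MHz."""
    return power_mw * 1000.0 / freq_mhz


def fo4_count(delay_ns: float, fo4_ps: float = FO4_PS_180NM) -> float:
    """Delay expressed as a multiple of the FO4 inverter delay."""
    return delay_ns * 1000.0 / fo4_ps


print(round(max_frequency_ghz(0.57), 2))         # operating speed in GHz
print(round(power_per_mhz_uw(1.03, 1000.0), 2))  # uW/MHz at 1 GHz
print(round(fo4_count(0.57), 1))                 # proposed, 64 bit
print(round(fo4_count(0.88), 2))                 # [38, 39], 64 bit
```

Under the same 60 ps assumption, the delays of the other 0.18 µm designs in Table 7 also map closely onto their tabulated FO4 values, which suggests a common normalisation constant was used for that node.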

### 6 Conclusion

In this paper, a novel scalable comparator using CEM and FM structures is proposed. The CEM comprises a regular structure of repeated logic cells used for implementing a parallel prefix tree structure. This regular structure can be used to predict the characteristics of the proposed comparator for arbitrary bitwidths. The proposed comparator has the highest operating frequency, the lowest power dissipation and the minimum FO4 delay as compared with existing comparators designed using 0.18 µm CMOS technology. These advantages make the proposed comparator suitable for various applications such as scientific computations, test circuits and memory addressing logic.

### References

- [1] Parhami, B.: 'Efficient hamming weight comparators for binary vectors based on accumulative and up/down parallel counters', IEEE Trans. Circuits Syst., 2009, 56, (2), pp. 167–171
- [2] Liu, H.J.R., Yao, H.: 'High-performance VLSI signal processing innovative architectures and algorithms' (IEEE Press, Piscataway, NJ, 1998)
- [3] Sheng, Y., Wang, W.: 'Design and implementation of compression algorithm comparator for digital image processing on component'. Proc. Ninth Int. Conf. Young Computer Scientists, Hunan, China, November 2008, pp. 1337–
- [4] Abramovici, M., Breuer, M.A., Friedman, A.D., et al.: 'Digital systems testing and testable design' (IEEE Press, Piscataway, NJ, 1990)
- [5] Chan, A., Roberts, G.W.: 'A jitter characterization system using a component-invariant Vernier delay line', IEEE Trans. Very Large Scale Integr. (VLSI) Syst., 2004, 12, (1), pp. 79–95
- [6] Oklobdzija, V.G.: 'An algorithmic and novel design of a leading zero detector circuit: comparison with logic synthesis', IEEE Trans. Very Large Scale Integr. (VLSI) Syst., 1994, 2, (1), pp. 124–128
- [7] Suzuki, H., Kim, C.H., Roy, K., et al.: 'Fast tag comparator using diode partitioned domino for 64 bit microprocessor', IEEE Trans. Circuits Syst. I, 2007, 54, (2), pp. 322–328
- [8] Ponomarev, D., Kucuk, G., Ergin, O., et al.: 'Energy-efficient comparators for superscalar datapaths', IEEE Trans. Comput., 2004, 53, (7), pp. 892–904
- [9] Guangjie, W., Shimin, S., Lijiu, J., et al.: 'New efficient design of digital comparator'. Proc. Second Int. Conf. Applications Specific Integrated Circuits, Shanghai, China, 1996, pp. 263–266
- [10] Norris, D.: 'Comparator circuit', U.S. Patent 5534844, April 1995
- [11] SN7485: 4 bit Magnitude Comparators, Texas Instruments, Dallas, TX, 1999
- [12] Glass, K.W.: 'Digital comparator circuit', U.S. Patent 5260680, February
- [13] Helms, H.L.: 'High-speed (HC/HCT) CMOS guide' (Prentice-Hall, Englewood Cliffs, NJ, 1989)
- [14] Uyemura, J.P.: 'CMOS logic circuit design' (Kluwer, Norwood, MA, 1999)
- [15] Abdel-Hafeez, S.: 'Single rail domino logic for four-phase clocking scheme', U.S. Patent 6265899, October 2001
- [16] Ercegovac, M.D., Lang, T.: 'Digital arithmetic' (Morgan Kaufmann, San Mateo, CA, 2004)
- [17] Stine, J.E., Schulte, M.J.: 'A combined two's complement and floating-point comparator'. Proc. Int. Symp. Circuits Systems, 2005, pp. 89–92
- [18] Cheng, S.: 'A high-speed magnitude comparator with small transistor count'. Proc. IEEE Int. Conf. Electronics, Circuits, Systems, Sharjah, United Arab Emirates, December 2003, pp. 1168–1171
- [19] Bellaour, A., Elmasry, M.I.: 'Low-power digital VLSI design circuits and systems' (Kluwer, Norwood, MA, 1995)
- [20] Wang, C.C., Wu, C.F., Tsai, K.C., et al.: '1 GHz 64 bit high-speed comparator using ANT dynamic logic with two-phase clocking', IEE Proc.-Comput. Digit. Tech., 1998, 145, (6), pp. 433–436
- [21] Belluomini, W., Jamsek, D., Martin, A.K., et al.: 'Limited switch dynamic logic circuits for high-speed low-power circuit design', IBM J. Res. Dev., 2006, 50, (2–3), pp. 277–286
- [22] Wang, C., Lee, P., Wu, C., et al.: 'High fan-in dynamic CMOS comparators with low transistor count', IEEE Trans. Circuits Syst. I, 2003, 50, (9), pp. 1216–1220
- [23] Huang, C.H., Wang, J.S.: 'High-performance and power-efficient CMOS comparators', IEEE J. Solid-State Circuits, 2003, 38, (2), pp. 254–262
- [24] Lam, H.M., Tsui, C.Y.: 'A mux-based high-performance single-cycle CMOS comparator', IEEE Trans. Circuits Syst. II, 2007, 54, (7), pp. 591–595
- [25] Maheshwari, N., Sapatnekar, S.S.: 'Optimizing large multiphase level-clocked circuits', IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1999, 18, (9), pp. 1249–1264
- [26] Frustaci, F., Perri, S., Lanuzza, M., et al.: 'Energy-efficient single-clock-cycle binary comparator', Int. J. Circuit Theory Appl., 2012, 40, (3), pp. 237–246
- [27] Coussy, P., Morawiec, A.: 'High-level synthesis: from algorithm to digital circuit' (Springer-Verlag, New York, 2008)
- [28] Perri, S., Corsonello, P.: 'Fast low-cost implementation of single-clock-cycle binary comparator', IEEE Trans. Circuits Syst. II, 2008, 55, (12), pp. 1239–1243
- [29] Lutz, D.R., Jayasimha, D.N.: 'The half-adder form and early branch condition resolution'. Proc. 13th IEEE Symp. Computer Arithmetic, Asilomar, CA, USA, July 1997, pp. 266–273
- [30] Ercegovac, M.D., Lang, T.: 'Sign detection and comparison networks with a small number of transitions'. Proc. 12th IEEE Symp. Computer Arithmetic, Bath, UK, July 1995, pp. 59–66
- [31] Bruguera, J.D., Lang, T.: 'Multilevel reverse most-significant carry computation', IEEE Trans. Very Large Scale Integr. (VLSI) Syst., 2001, 9, (6), pp. 959–962
- [32] Hensley, J., Singh, M., Lastra, A., et al.: 'A fast, energy-efficient z-comparator'. Proc. ACM Conf. Graphics Hardware, Los Angeles, California,
- [33] Ekanayake, V.N., Clinton, I.K., Manohar, R., et al.: 'Dynamic significance compression for a low-energy sensor network asynchronous processor'. Proc. 11th IEEE Int. Symp. Asynchronous Circuits Systems, New York City, NY, USA, March 2005, pp. 144–154
- [34] Lam, H.M., Tsui, C.Y.: 'High-performance single clock cycle CMOS comparator', Electron. Lett., 2006, 42, (2), pp. 75–77
- [35] Kim, J.Y., Yoo, H.J.: 'Bitwise competition logic for compact digital comparator'. Proc. IEEE Asian Solid-State Circuits Conf., Jeju, South Korea, November 2007, pp. 59–62
- [36] Hafeez, A., Ross, A., Parhami, B., et al.: 'Scalable digital CMOS comparator using a parallel prefix tree', IEEE Trans. Very Large Scale Integr. (VLSI) Syst., 2013, 21, (11), pp. 1989–1998
- [37] Boppana, N., Ren, S.: 'A low-power and area-efficient 64 bit digital comparator', J. Circuits Syst. Comput., 2016, 25, (2), pp. 1650148–1650163
- [38] Chua, C., Kumar, R., Sireesha, B., et al.: 'Design and analysis of low-power and area-efficient N-bit parallel binary comparator', Analog Integr. Circuits Signal Process., 2017, 92, (2), pp. 225–231
- [39] Chua, C., Kumar, R.: 'An improved design and simulation of low-power and area-efficient parallel binary comparator', Microelectron. J., 2017, 66, pp. 84-
- [40] Cheng, K.H., Huang, C.S.: 'The novel efficient design of XOR/XNOR function for adder applications'. Proc. IEEE Int. Conf. Electronics, Circuits and Systems, Pafos, Cyprus, September 1999, pp. 29–32
- [41] 'Cadence online documentation'. Available at http://www.cadence.com, accessed 2010